Abstract


A characteristic of many proteoforms derived from a single gene is the similarity
of their atomic composition, which makes their analysis very challenging. Many
overexpressed recombinant proteins are strongly associated with this problem,
especially recombinant therapeutic glycoproteins from large-scale production. In
contrast to small-molecule drugs, which consist of a single defined molecule,
therapeutic protein preparations are heterogeneous mixtures of dozens or even
hundreds of very similar species. With mass spectrometry, high-quality spectra of
intact proteoforms can currently be obtained only if the complexity of the mixture
of individual proteoform ions entering the gas phase at the same time is low.
Thus, prior to mass spectrometric analysis, an effective separation is required to
obtain fractions with a low number of individual proteoforms. This is especially
true for recombinant therapeutic proteins, because of their huge heterogeneity,
but it is also relevant for top-down proteomics. Purification of proteoforms is
the bottleneck in analyzing intact proteoforms with mass spectrometry. This review
focuses on the current state of the art, especially of liquid chromatography, for
preparing proteoforms for mass spectrometric top-down analysis. The topic of
therapeutic proteins has been chosen because this group of proteins is the most
challenging with regard to proteoform analysis.


Keywords: proteoforms, top-down mass spectrometry, therapeutic proteins, 
liquid chromatography, protein purification parameter screening, displacement


1. Introduction


The analysis of proteoforms, often also termed protein species or isoforms, is
the next level in proteomics. The first comprehensive definition of this subgroup of
proteins was published by Jungblut et al. [1] and Schlüter et al. [2], using the term
“protein species”. In 2013, Smith and Kelleher [3] introduced the term “proteoform”,
which today is widely accepted in the proteomics community. The concept of
“proteoform” is nearly identical to the concept of “protein species”. The only
difference is that the proteoform concept is gene-centric, whereas the
protein-species concept is chemistry-centric.

For developing methods for the comprehensive analysis of proteoforms, the group
of therapeutic proteins is a suitable training area. Therapeutic proteins are known
to be rich in proteoforms. Although a therapeutic protein product contains only
trace amounts of impurities such as host cell proteins, which are difficult to
detect because of their very low concentration, the analysis of its proteoforms is
very challenging because of their large number, their similarity and their low
concentration compared to the main proteoform.


2. Analysis of proteoforms: challenges 


The most common method in proteomics is the bottom-up or shotgun approach.
It relies on the proteolytic cleavage of proteins by proteases like trypsin. The
resulting peptide mixture is subjected to liquid chromatography coupled to tandem
mass spectrometry (LC-MS/MS) analysis. Proteins are identified from the LC-MS/MS
data by comparing the peptide fragment spectra against in-silico fragment spectra
generated from a protein database [4]. As a rule of thumb, a protein is claimed to
be identified if at least two unique peptides representing parts of its sequence
are identified. Thus, a sequence coverage of 100% is often not obtained. In that
case, it can only be stated that one or several products (proteoforms) of a defined
gene have been identified. No information about the identity of the underlying
proteoform is obtained. It can even be assumed that the identified tryptic peptides
may be products of several different proteoforms. For the characterization of a
therapeutic protein, bottom-up proteomics is a standard method. The signals in the
LC-MS chromatograms represent tryptic peptides of all proteoforms of the therapeutic
protein. A defined tryptic peptide that is present in all proteoforms will form one
single monoisotopic signal. Its signal intensity represents the sum of this peptide
from the different species. The presence of an individual proteoform can only be
detected if this proteoform yields a tryptic peptide, for example a defined
phosphopeptide, that is unique to this proteoform. However, it cannot be excluded
that several proteoforms contain that peptide. As a result, bottom-up proteomics
is helpful for obtaining LC-MS chromatograms that can be used as fingerprints of a
therapeutic protein, but it gives no information about the number and composition
of proteoforms within the therapeutic protein product. The detection of a
low-abundance proteoform is especially difficult, since a unique tryptic peptide of
such a proteoform is present in a low amount, and the corresponding signal in a
bottom-up proteomics LC-MS chromatogram will therefore have a low intensity. Thus,
if the detection of different proteoforms is of interest, top-down mass
spectrometry (TDMS) is the method of choice, because it utilizes the intact
proteoform for analysis instead of proteolytic peptides.
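
The two-peptide rule of thumb and the resulting incomplete sequence coverage can be illustrated with a minimal Python sketch. The toy protein sequence, the peptide list and the function names `sequence_coverage` and `is_identified` are hypothetical and serve only as an illustration; real search engines score fragment spectra and also check that a peptide is unique within the whole database, which this sketch does not do.

```python
def sequence_coverage(protein: str, peptides: list[str]) -> float:
    """Fraction of residues covered by at least one identified peptide."""
    covered = [False] * len(protein)
    for pep in set(peptides):
        start = protein.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return sum(covered) / len(protein)


def is_identified(protein: str, peptides: list[str], min_peptides: int = 2) -> bool:
    """Rule of thumb: at least `min_peptides` distinct peptides match the sequence."""
    matching = {pep for pep in peptides if pep in protein}
    return len(matching) >= min_peptides


# Hypothetical toy sequence and two identified tryptic peptides.
protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSR"
peptides = ["TAYIAK", "QISFVK"]

print(is_identified(protein, peptides))             # protein counts as identified
print(f"{sequence_coverage(protein, peptides):.0%}")  # but most residues remain unobserved
```

In this toy case the protein is "identified" although only 12 of 39 residues are observed, so any modification outside the two peptides would go unnoticed.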

For performing a TDMS analysis, a purified individual intact proteoform is
transferred into the MS. From the MS spectrum of the intact ions, the molecular
weight can be determined. Various techniques are available for fragmentation of
the intact proteoform, such as HCD, CID, ETD, EThcD, ECD, UVPD and IRMPD,
yielding different types of fragments, which complement each other [5]. After
fragmentation, the proteoform can be identified by interpreting the fragment
spectrum. Several software tools are available for analyzing TDMS data of intact
proteoforms [6-8]. The review of Schaffer et al. is recommended as an introduction
to TDMS [9]. Robust protocols for mass analysis of intact proteins with TDMS were
recently published by Donnelly et al. [10]. TDMS requires sample mixtures of
low complexity to obtain high-quality spectra of proteoforms. Aebersold et al.
estimated the number of proteoforms present in the human organism to be in the
range of approximately a billion [11]. Thus, very efficient purification steps prior
to TDMS are required to tackle the huge number of individual proteoforms in cells,
tissues or body fluids. Besides the excessive number of individual proteoforms,
their dynamic range is a further challenge.
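
How the molecular weight follows from the spectrum of the intact ions can be sketched for the common case of electrospray ionization, assuming two neighboring peaks of the same proteoform that differ by one proton. The function name `mass_from_adjacent_peaks` and the simulated 30 kDa proteoform are assumptions made for illustration; the deconvolution tools cited above use far more elaborate algorithms.

```python
PROTON_MASS = 1.007276  # Da


def mass_from_adjacent_peaks(mz_high: float, mz_low: float) -> tuple[int, float]:
    """Estimate charge and neutral mass from two adjacent charge-state peaks.

    mz_high is the peak with charge z, mz_low the neighboring peak with
    charge z + 1; both are assumed to be protonated ions [M + zH]z+.
    """
    z = round((mz_low - PROTON_MASS) / (mz_high - mz_low))
    mass = z * (mz_high - PROTON_MASS)
    return z, mass


def mz_of(mass: float, z: int) -> float:
    """m/z of a protonated ion with neutral mass `mass` and charge z."""
    return mass / z + PROTON_MASS


# Simulated peaks of a hypothetical 30 kDa proteoform at charge states 20 and 21.
z, mass = mass_from_adjacent_peaks(mz_of(30000.0, 20), mz_of(30000.0, 21))
print(z, round(mass, 1))
```

If several proteoforms of similar mass enter the gas phase together, their charge-state series overlap, which is one reason why the low sample complexity stressed above matters.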


3. Analysis of proteoforms of recombinant therapeutic proteins: 
challenges 


Similar challenges are associated with recombinant therapeutic proteins. The
importance of therapeutic proteins has been increasing continually over the past
years [12, 13]. Currently, several types of therapeutic proteins [14] are available
on the market, including monoclonal antibodies (mAbs), erythropoietin (EPO),
insulin, human growth hormone and many more. The therapeutic protein market is
dominated by monoclonal antibodies, with sales of approximately $123 billion
in 2017, and is expected to grow further with the upcoming biosimilar market [13].
Therapeutic proteins possess several advantages over small-molecule drugs due to
their higher specificity towards drug targets, which in most cases are also proteins
[15]. This enables therapeutic proteins to target specific key steps in disease
pathology [16].

This group of man-made proteins presumably has a significantly higher number
of proteoforms per gene than gene products in vivo, resulting in a huge number
of proteoforms within a single recombinant therapeutic protein (rTP) product. The
heterogeneity develops during the production of an rTP, mainly in upstream
processing. The first event increasing the heterogeneity is alternative splicing
[17-19]. The second critical step is protein biosynthesis at the ribosomes, in
which errors can occur. Proteolytic cleavage may happen at any stage after the
protein has left the ribosome, not only within the host cell but also extracellularly
if host cell proteases have not been removed by purification of the target protein.
Many therapeutic proteins, like conventional monoclonal antibodies or erythropoietin
[20], are posttranslationally modified by glycans. The glycan chains in particular
add a further factor multiplying the heterogeneity of proteoforms. An example of a
therapeutic glycoprotein is etanercept, which is decorated with O- and N-glycans.
Commercial preparations of etanercept used as drugs show a very high degree of
complexity [21]. It can be assumed that therapeutic fusion proteins applied to
patients, like etanercept, contain even hundreds of species that differ in their
exact composition of atoms. In addition to glycans, all other forms of
posttranslational modification are possible, depending on the nature of the protein,
the type of host cell and the upstream parameters.
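
The multiplying effect of independent modification sites can be made concrete with a small Python sketch. The sites and the number of alternative states per site are purely hypothetical and are not taken from any of the cited products; the product is an upper bound that assumes all sites vary independently.

```python
from math import prod

# Hypothetical number of alternative states per modifiable site
# (for glycosylation sites: different glycan structures plus "unoccupied").
site_states = {
    "N-glycosylation site 1": 8,
    "N-glycosylation site 2": 8,
    "O-glycosylation site": 4,
    "methionine oxidation": 2,
    "asparagine deamidation": 2,
}

# Each site multiplies the number of theoretically possible proteoforms.
n_proteoforms = prod(site_states.values())
print(n_proteoforms)  # 8 * 8 * 4 * 2 * 2 = 1024
```

Even this small, assumed set of five sites yields more than a thousand theoretical combinations, which illustrates why hundreds of species in a single product are plausible.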

Why is the heterogeneity of recombinant therapeutic proteins much higher than the
heterogeneity of gene products in vivo? Host cells used for the production of
recombinant therapeutic proteins are optimized to synthesize a large excess of
recombinant proteins [22]. However, increasing the expression of proteins does not
usually correlate with an increase in the correctly processed bioactive form of the
recombinant proteins [22]. Consequently, the probability increases that these
overexpressed recombinant proteins are subject to errors during synthesis, side
reactions of enzymes and spontaneous chemical reactions. As a result, the number of
recombinant species of low quality is much higher than in a native cell in an
intact organism [23]. It was reported that overexpression of recombinant therapeutic
proteins is also accompanied by an increase in high molecular weight aggregates
and misfolded forms [24]. Thus, it can be assumed that the cellular systems that
usually remove low-quality or incorrectly processed proteins are swamped by
these inadequate proteins [25], so that these species are neither further processed
nor eliminated in the cell. Besides the enzymatic reactions, which mainly take place
during upstream processing, chemical reactions that modify the recombinant
therapeutic proteins can occur during the whole production process, including even
the final product fill and finish or storage [26, 27]. A very common reaction is the
oxidation of methionine, which can happen at nearly every stage of production and
can affect the efficacy of the product.

Is any risk associated with the large number of species? Fortunately, severe side
effects associated with species that are not exactly identical to the target
protein have been reported very rarely. An unfortunate case with dramatic
consequences for a few patients was reported by Seidl et al. [28]. In this case,
tungsten ions, a contaminant that got into the glass vials during the production
of the vials, induced the dimerization of erythropoietin. As a result, a few
patients developed autoantibodies against erythropoietin, thereby destroying the
remaining cells in these patients that were producing the native hormone. Since
therapy with erythropoietin was no longer possible, these patients had to receive
blood transfusions to survive. Non-human glycan structures bound to therapeutic
proteins, which can occur when producing them in mouse cells, can induce
hypersensitivity reactions [29, 30].

More common than severe side effects is the phenomenon that species showing even
small differences in their composition of atoms compared with the target species
are less potent than the target species. For example, deamidation,
causing a +1 Da shift of the molecular weight, can decrease the efficacy of a
therapeutic protein [31], as observed with recombinant human interleukin (rhIL)-15
[32]. Deamidation converts asparagine or glutamine to aspartic acid or glutamic
acid, respectively. As a result, the polar, uncharged amides are changed into
negatively charged carboxylic acids, which affects the surface-charge density and
surface hydrophobicity of the protein and thereby explains the change in efficacy
of a therapeutic protein. Deamidation of asparagine can occur spontaneously at the
physiological pH of 7.4 [32]. A further important modification of proteins is the
disulfide bond (S-S), which is formed by the oxidation of the thiol groups (SH) of
two cysteine residues, resulting in a covalent bond [33] and decreasing the
molecular weight of a protein by 2 Da. Disulfide bonds have an impact on protein
stability as well as on activity [33]. Du et al. stated that during the
manufacturing process, extensive reduction of antibodies has been observed after
the harvest operation or Protein A affinity chromatography, and that multiple
process parameters correlate with the extent of the reduction [34]. The topic
“disulfide bonds of therapeutic proteins” is discussed in depth by Lakbub et al. [35].
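
The nominal shifts of +1 Da and -2 Da mentioned above can be derived from monoisotopic atomic masses, as the following Python sketch shows. The dictionary `MASS_SHIFTS`, the helper `modified_mass` and the 25 kDa example mass are illustrative assumptions; methionine oxidation (addition of one oxygen) is included because it was named earlier as a very common reaction.

```python
# Monoisotopic atomic masses in Da.
H, N, O = 1.00782503, 14.00307401, 15.99491462

MASS_SHIFTS = {
    "deamidation": (O + H) - (N + 2 * H),  # amide -NH2 replaced by -OH
    "disulfide_bond": -2 * H,              # two thiol hydrogens are lost
    "met_oxidation": O,                    # one oxygen atom is added
}


def modified_mass(mass: float, modifications: dict[str, int]) -> float:
    """Mass after applying the given number of each modification."""
    return mass + sum(MASS_SHIFTS[name] * n for name, n in modifications.items())


for name, shift in MASS_SHIFTS.items():
    print(f"{name}: {shift:+.3f} Da")

# Hypothetical 25 kDa proteoform with one deamidation and two disulfide bonds.
print(round(modified_mass(25000.0, {"deamidation": 1, "disulfide_bond": 2}), 3))
```

For a hypothetical 25 kDa proteoform, a single deamidation changes the mass by roughly 39 ppm, which gives an impression of how similar two proteoforms can be.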

More details about sources and effects of microheterogeneity are described in 
the excellent reviews of Beyer [36] and Ambrogelly [37]. 

How large are the differences of the individual proteoforms of a therapeutic protein?
Proteoforms can vary in all known chemical properties, such as size, isoelectric
point (pI) [38] and hydrophobicity [39]. The pIs of recombinant erythropoietin
vary from 3.5 to 6 [38, 40]. Therapeutic proteins are characterized by the
presence of size variants arising from the manufacturing process or from storage
conditions when exposed to chemical, physical or conformational stress [41]. These
size variants may include N-terminally clipped proteins, truncated forms, fragments
representing lower molecular weight species or improperly assembled therapeutic
proteins. The formation of dimers or of multimers, in which more than two monomers
form a complex, is a problem associated with many therapeutic proteins [42]. Such
aggregates can induce adverse immune responses in patients [43]. The proteoforms of
recombinant erythropoietin vary within a range of 4-6 kDa [20]. Besides these
larger differences in size, the composition of atoms of many proteoforms derived
from one single gene can be very similar within subtypes of proteoforms, such as
the family of acidic proteoforms. As a result, the separation of charge variants
by ion exchange is usually successful, but a single fraction might contain not
only one but multiple proteoforms [44].


4. Separation of proteoforms of therapeutic proteins 


4.1 Separation of proteoforms of therapeutic proteins with liquid 
chromatography 


Liquid chromatography (LC) is the most common method for the purification and
fractionation of therapeutic proteins [37]. The proteoforms are separated either
by size-exclusion chromatography (SEC), which makes use of different path lengths
through chromatographic particles related to the size of the proteins, or by
adsorption chromatography. The latter applies the principle of separating molecules
by their different velocities while crossing a column filled with chromatographic
particles. The velocities decrease with increasing affinity of the molecules
towards the stationary phase. Depending on the chemistry of the functional groups
of the stationary phase, different forms of liquid chromatography based on
adsorption to the stationary phase are possible, highlighted in bold in Table 1.

Table 1 gives an overview of the different types of separation methods
and their frequency of application, with a focus on therapeutic proteins and, in
addition, with respect to proteoforms. Comparing the numbers in column 2 with
those in column 3 clearly shows that the topic of proteoforms is not yet addressed
very often. The selected reviews give deeper insights into the different
separation methods.
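
The comparison of column 2 with column 3 can be expressed as the share of publications that remain after adding the proteoform-related filter. The following Python sketch uses a subset of the hit counts listed in Table 1; the variable names are chosen only for illustration.

```python
# (hits for therapeutic proteins, hits after adding the filter), taken from Table 1.
hits = {
    "AF": (1046, 189),
    "AEC": (80, 20),
    "CEX": (56, 23),
    "HIC": (46, 10),
    "RP": (170, 41),
    "SEC": (973, 142),
    "LC": (1969, 308),
}

# Share of publications per method that also mention isoforms, variants,
# species or proteoforms.
fractions = {method: filtered / total for method, (total, filtered) in hits.items()}

for method, fraction in fractions.items():
    print(f"{method}: {fraction:.0%}")
```

For every method in this subset, fewer than half of the publications remain after filtering, and for liquid chromatography as a whole the share is about 16%.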

Affinity chromatography using chromatographic material derivatized with 
protein-A is the most common and effective method for the purification of recom- 
binant monoclonal antibodies [45]. For the separation of proteoforms of recombi- 
nant monoclonal antibodies, it is not very relevant. 

Ion exchange chromatography (IEX): charge variants of therapeutic proteins such
as acidic or basic species can be separated with ion exchange chromatography (IEX)
[46]. IEX of proteins can be performed with oppositely charged ionic groups on
the stationary phase as either anion exchange or cation exchange chromatography.
Elution buffers decrease the electrostatic interactions of the proteins with the IEX
material, thereby decreasing the affinity of the proteins towards the stationary
phase. Elution can be either pH or salt based [47]. Salt-based elution is used for
IEX with online ultraviolet (UV) detection. Coupling IEX directly with MS is only
possible if the elution buffer system is volatile [48]. Acidic species are often
related to posttranslational modifications (PTMs) like sialic acid or deamidation
of asparagine, while basic variants are formed by aspartate isomerization,
succinimide formation, and variants of C-terminal lysine and N-terminal glutamine
[49]. IEX gives relative quantitative information about charge variants, which can
be important for the qualification of manufacturing batches [50].

Hydroxyapatite-chromatography (HAP) is based on a material consisting of
crystals of calcium hydroxyapatite, described by the formula Ca5(PO4)3(OH).
HAP can be described as mixed-mode chromatography. The Ca2+ ions can act via
electrostatic interactions as an anion exchanger. In addition, metal coordination
bonds can be formed between carboxylic groups and the Ca2+ ions. The anionic
phosphate groups of HAP adsorb positively charged molecules by electrostatic
interactions. Phosphate, chloride-ion and calcium-ion gradients are common,
as are multi-component gradients [39]. Therefore, finding appropriate eluents
is more difficult than with anion-exchange chromatography. However, systematically
screening appropriate parameters of eluent systems should offer the chance to
separate proteoforms. As indicated in Table 1, HAP is not very often applied for the
chromatography of therapeutic proteins, which may be associated with the fact that
it is more complex to find optimal elution systems.


Separation method                                     PubMed hits   Hits after filter   Review
Affinity chromatography (AF)                          1046          189                 [45, 100]
Anion exchange chromatography (AEC)                   80            20                  [101, 102]
Cation exchange chromatography (CEX)                  56            23                  [49, 103]
Hydroxyapatite chromatography (HAP)                   6             1                   [104]
Hydrophobic interaction chromatography (HIC)          46            10                  [39]
Hydrophilic interaction chromatography (HILIC)        8             3                   [51, 52]
Immobilized metal affinity chromatography (IMAC)      59            5                   [105, 106]
Mixed mode chromatography (MM)                        10            0                   [59]
Reversed phase chromatography (RP)                    170           41                  [107]
Size exclusion chromatography (SEC)                   973           142                 [108]
Liquid chromatography (LC)                            1969          308                 [109]
Two dimensional liquid chromatography (2D-LC)         11            2                   [110, 111]
Capillary electrophoresis (CE)                        151           31                  [112]
Two dimensional electrophoresis (2DE)                 12            4                   [38, 113]

The second column lists the hits obtained by searching the knowledgebase PubMed (search date 24.07.2019,
8:00 pm) with the search terms “monoclonal recombinant antibody OR therapeutic-proteins OR biotherapeutics
AND [name of the separation method in the left column]”, for example “monoclonal recombinant antibody OR
therapeutic-proteins OR biotherapeutics AND anion-exchange-Chromatography”. The third column presents the
hits after adding the filter “AND (isoform OR variant OR species OR proteoform)”. In bold: Forms of liquid
chromatography based on adsorption.

Table 1.
Overview of methods for the separation of therapeutic proteins, showing the frequency of their application
and recommended reviews.


Hydrophilic interaction chromatography (HILIC) makes use of the high affinity
of polar and hydrophilic molecules for a hydrophilic stationary phase [51, 52]. Usually
the sample application buffer has a high content (>80%) of an organic solvent
like acetonitrile. Thus, it works well for glycans. However, proteins may
precipitate under these conditions. If the proteoforms of interest do not
precipitate, HILIC is an interesting alternative to other forms of adsorption
chromatography, especially if precipitating proteoforms are thereby removed from
the proteoforms of interest.

Hydrophobic interaction chromatography (HIC) is yet another method that can
be used for separating different proteoforms of a therapeutic protein. These
separations rely on the varying hydrophobicity profiles caused by changes in the
conformation of the protein. HIC separations use reverse salt gradients and can
operate in nondenaturing mode [53]. HIC was presented as a reliable method for
monitoring the oxidation of tryptophan residues in the complementarity-determining
regions (CDRs) of recombinant mAbs [54]. HIC is effective in resolving the
proteoforms of antibody-drug conjugates varying in drug-to-antibody ratio [55].
Charge variants coeluting in IEX can be resolved with HIC in the second dimension of
separation. Douglas and colleagues demonstrated the separation of carboxy-terminal
variants and isomerization variants with HIC, which could not be resolved at the
IEX level [56]. Quantitative information on succinimide variants was obtained by
HIC with a TSKgel butyl-NPR column [57]. A similar application can also be found in
the detection of impaired disulfide bonding. Typical HIC buffers like ammonium
sulfate require desalting of the proteins prior to MS [58]. Recently, direct
coupling of HIC with MS for the detailed characterization of mAbs was demonstrated
by applying a volatile ammonium acetate buffer [53].

Immobilized metal-affinity-chromatography (IMAC) is widely used for enriching
recombinant proteins with histidine tags from a protein extract from host cells. For
the production of therapeutic proteins, IMAC is not used very often, because metal
ions bleed into the product. Metal ions like nickel or copper are critical for
patients. For the separation of subgroups of proteoforms for analytical purposes,
however, IMAC is also an option.

Mixed mode chromatography (MM) is performed with stationary phases that
carry at least two different functional groups [59], like hydroxyapatite (see
above). Consequently, an MM material offers two or more types of chromatography.
HAP combines anion exchange (AEX), cation exchange (CEX) and IMAC. Mixed mode
chromatography is also possible with SEC, as described by Schlüter
et al. [60]. In that study, the electrostatic interactions induced by anionic sugars,
which are part of a dextran polymer, were used to separate vanillylmandelic acid,
glycine and phenylalanine from each other with an SEC column that is usually
applied for the separation of proteins in the range of 10-100 kDa. Mixed mode
chromatography is not very often described for the chromatography of therapeutic
proteins (Table 1), but it has a huge potential for the separation of proteoforms.
For successful separations, a rational screening of appropriate parameters is
recommended.

Size exclusion chromatography (SEC) is a gold standard for monitoring the
presence of aggregates of therapeutic proteins. SEC uses a porous stationary phase
material in which the size variants are separated based on their differential
access to the pores of the SEC material, resulting in different path lengths in
relation to their size [61, 62]. SEC effectively separates low molecular weight and
high molecular weight species in mAbs [63]. SEC has found many applications, such as
stability testing [64], quality control during manufacturing [65], in-depth
characterization of antibody-drug conjugates (ADCs) [66] and assessment of
aggregate content in biosimilarity studies [67]. However, the resolution of SEC is
too poor to clearly distinguish individual size variants. Non-specific adsorption
to the SEC material can result in peak broadening, thereby decreasing resolution.
This problem may be minimized by using organic modifiers in the mobile phase or by
adjusting the pH in relation to the pI of the therapeutic protein [61]. Advances in
the chemistries of stationary phases incorporating very small core-shell particles
or the use of sub-micron particles are improving the resolution of SEC columns [61].

Reversed phase liquid chromatography (RPLC) mainly exploits the differences in
hydrophobic properties of molecules for their separation. Sample application onto
RPLC columns is performed with eluents having a high content of water, which
supports a high affinity of the molecules in the sample towards the hydrophobic
stationary phase. Elution is achieved with gradients of increasing concentration
of organic solvents in the eluent. Coupling RPLC with high-sensitivity
detectors can provide qualitative and quantitative information on the cleaved and
modified proteoforms along with the main form [68]. Ambrogelly et al. reported RPLC
as a method that gives a first-hand check of the product quality to help in
optimizing the purification strategy [69]. When coupled to high-resolution mass
spectrometric detection, RPLC also allows the major glycoforms to be distinguished.
More than a decade ago, Dillion presented RPLC for determining the intact mAb
glycosylation profile, but only with the use of high temperature and organic
solvents with high eluotropic strength coefficients [70]. Many advancements to
conventional RPLC columns have emerged in recent times to improve the separation of
large therapeutic proteins under milder conditions [71].

The major concern in the use of RPLC for protein separations is the presence 
of organic solvents, which may precipitate proteins. Since precipitation will occur 
on the column, it is very difficult to recognize. In the case of proteoforms, it can 
be assumed that some may be more prone to precipitation than others. As a result, 
the chromatogram, in which signals from some but not all proteoforms are present, 
may be misinterpreted since the chromatogram is giving no information about the 
proteoforms which got lost by precipitation. TDMS protocols often apply RPLC 
for the analysis of proteoforms, because those species, which elute, are present in a 
liquid, which is optimal for electrospray ionization (ESI). Because of the problem
of protein precipitation in RPLC, the question in all TDMS approaches is how
representative the TDMS chromatogram is of the original composition of
proteoforms or, vice versa, how many proteoforms were lost during RPLC.

Elution modes of liquid chromatography: besides the different types of stationary
phases, different elution modes exist that have an impact on the separation
of molecules, namely isocratic elution, gradient elution (GE) and displacement
elution (DE). DE typically uses the same sample application buffer and adsorption
chromatography materials as gradient elution. In contrast to GE, DE does not use a
salt gradient with an increasing concentration of a salt having a low affinity towards
the stationary phase; instead, the elution buffer of DE consists of the sample
application buffer to which the displacer is added. The displacer should ideally
have a higher affinity to the stationary phase than any of the sample components.
After the sample application onto the column is finished, the eluent containing the
displacer is immediately pumped onto the column. At the beginning, the displacer
molecules bind strongly to the top of the column, thereby displacing the
sample components with the highest affinity. These sample components then
displace the sample components with a lower affinity, and so on. By this process,
bands are formed that move down the column, driven by the displacer. DE is finished
as soon as the displacer has completely saturated the stationary phase of the column.
Within a band, a high purity of the component is achieved [72]. DE has been shown
to be suitable for the separation of complex mixtures of tryptic peptides [73-75] and
proteins [76-79]. One of the characteristics of DE is that it has a different
selectivity compared with GE [77]. This is one important argument for using DE for the
separation of proteoforms. Thus, it is not surprising that DE has been applied
successfully to the separation of proteoforms of therapeutic proteins [46, 80-84].
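
The order of the bands in displacement elution follows directly from the affinities described above. The Python sketch below is a strongly idealized illustration: the function `displacement_train`, the component names and the affinity values (arbitrary units) are hypothetical, and real band formation also depends on concentrations and adsorption isotherms, which are ignored here.

```python
def displacement_train(affinities: dict[str, float], displacer_affinity: float) -> list[str]:
    """Idealized order in which bands leave the column in displacement elution.

    The component with the lowest affinity is pushed furthest ahead and
    leaves first; the displacer follows behind all sample components.
    """
    if displacer_affinity <= max(affinities.values()):
        raise ValueError("the displacer must bind more strongly than every sample component")
    return sorted(affinities, key=affinities.get) + ["displacer"]


# Hypothetical relative affinities of three proteoforms (arbitrary units).
sample = {"proteoform A": 1.0, "proteoform B": 2.5, "proteoform C": 1.8}
print(displacement_train(sample, displacer_affinity=5.0))
# ['proteoform A', 'proteoform C', 'proteoform B', 'displacer']
```

The explicit check mirrors the requirement that the displacer should have a higher affinity than any sample component; otherwise the components with a higher affinity would not be displaced.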

Rational screening of parameters of liquid chromatography is recommended for
optimal results in the separation of proteoforms. The first method describing
multi-parallel high-throughput screening for parameters of liquid chromatography
was published in 2002 by the group of Cramer [85]. In this case, the authors screened
for displacers for ion-exchange systems. In the following year, the group reported
a multi-parallel high-throughput screening for displacers based on batch
chromatography [86]. Thiemann et al. published a similar approach termed
protein-purification parameter screening system (PPS), which did not focus on the
identification of appropriate displacers but more generally on any kind of parameter
for adsorption chromatography, independent of the elution mode [87]. The PPS
was successfully applied for the purification and identification of an
angiotensin-II generating enzyme [88] and for screening for parameters for optimal
displacement chromatography of proteins [78, 79]. Rational screening was also used
for developing a displacement chromatography of proteoforms of a recombinant
protein with HIC [89].
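
What a systematic parameter screen has to cover can be sketched by enumerating all combinations of a few parameters. The parameters and their values below are assumptions chosen for illustration and are not the conditions used in the cited screening systems.

```python
from itertools import product

# Hypothetical screening parameters for adsorption chromatography.
materials = ["AEX", "CEX", "HIC"]
ph_values = [5.0, 6.0, 7.0, 8.0]
salts = ["NaCl", "ammonium sulfate"]
salt_conc_mM = [0, 100, 500]

# One batch experiment per combination of material, pH, salt and concentration.
conditions = [
    {"material": m, "pH": ph, "salt": s, "salt_mM": c}
    for m, ph, s, c in product(materials, ph_values, salts, salt_conc_mM)
]

print(len(conditions))  # 3 * 4 * 2 * 3 = 72
print(conditions[0])
```

Even this small grid yields 72 batch experiments, which indicates why multi-parallel, high-throughput formats are attractive for such screens.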


4.2 Separation of proteoforms of therapeutic proteins with capillary 
electrophoresis 


Compared with liquid chromatography, capillary electrophoresis (CE) offers 
better resolving power. CE techniques such as capillary zone electrophoresis (CZE), 
capillary gel electrophoresis (CGE) and capillary isoelectric focusing (CIEF) have 
been adapted for the separation and characterization of proteins [90, 91]. These 
are basic techniques routinely used for quality control [91]. With CGE, the size of 
proteins is characterized, while in CIEF, proteins are separated according to their 
isoelectric point (pI). CIEF is using pH gradients formed by carrier ampholytes in a 
capillary [92]. It is important to note that pH plays a major role in CZE and should 
be well maintained [93]. Considerable protein adsorption must be considered when 
performing CIEF and CZE. The interaction of the analytes with the surface of the 
capillary may compromise the resolution, peak widths and shapes when using con- 
ventional bare fused-silica capillaries. Minimizing adsorption can be done by using 
better coating material or using reagents that reduce adsorption [94]. A penetrated 
surface layer protein A from bacteria was reported as capillary coating. The coating 
could be used for over 100 injections without loss of separation performance [95]. 
Another study reported that adsorption still happened when using LPA-coated 
capillary [96]. 

CZE and CIEF are more often used for separations of charge variants induced by 
C-terminal lysine truncation, N-terminal pyroglutamate formation, sialylation and 
deamidation [97]. 

The direct coupling of CE with MS is technically challenging with regard to the
CE-MS interface [98]. One study demonstrated the successful direct coupling of
CIEF with mass spectrometry for the characterization of trastuzumab, bevacizumab,
cetuximab and infliximab by optimizing the reagents and the liquid composition and
by adding glycerol to the sample mixture to reduce non-CIEF electrophoretic
mobility and band broadening [99]. A CZE method was developed for the intact analysis
of recombinant human interferon-β1 (rhIFN-β1). The charged species due to
deamidation and sialylation were sufficiently separated. In contrast to dynamic
polymeric coatings, such as polybrene or hydroxypropyl-methylcellulose, the authors
covalently coated the bare fused-silica capillary with cross-linked
polyethyleneimine (CPEI) to obtain a positively charged surface, thus reducing the
possibility of protein interaction with the coating. They then coupled this CZE to
ESI-MS/MS and identified 138 proteoforms, of which 55 were quantified.

For the in-depth characterization of the composition of proteoforms of a 
therapeutic protein CE online-coupled to MS is a good option, if prior to the CE, 
the mixture of proteoforms has already been fractionated by LC using separation 
mechanisms orthogonal to the CE separation mechanism.


5. Conclusion


Huge progress has been made in the field of TDMS, allowing the identification
and comprehensive analysis of the composition of atoms of proteoforms, especially
if they are smaller than 30 kDa. TDMS analysis of larger proteoforms is still more
challenging. However, to date the most critical point is the purification of a
proteoform towards near homogeneity, or at least a significant reduction of the
complexity of the sample that is desorbed and ionized into a tandem mass spectrometer
for TDMS. A low complexity of the protein mixture entering the MS is still
mandatory for obtaining high-quality spectra. Thus, efficient separation
methods are needed for obtaining fractions with low complexity. Therapeutic proteins
are well suited for developing strategies for separating proteoforms, although they
are challenging because of their heterogeneity. In-depth separation of the
proteoforms of a therapeutic protein requires the combination of fractionation
techniques based on orthogonal mechanisms. In addition, the combination of gradient
chromatography and displacement chromatography will add further opportunities for
successful separations.